ABSTRACT

Plant MADS-domain transcription factors act as key 
regulators of many developmental processes. 
Despite the wealth of information that exists about
these factors, the mechanisms by which they recognize
their cognate DNA-binding site, called the CArG-box
(consensus CCW6GG), and how different MADS-domain
proteins achieve DNA-binding specificity, are still
largely unknown. We used information from in vivo
ChIP-seq experiments, in vitro DNA-binding data and
evolutionary conservation to address these important
questions. We found that
structural characteristics of the DNA play an import- 
ant role in the DNA binding of plant MADS-domain 
proteins. The central region of the CArG-box largely 
resembles a structural motif called 'A-tract', which 
is characterized by a narrow minor groove and 
may assist bending of the DNA by MADS-domain 
proteins. Periodically spaced A-tracts outside the 
CArG-box suggest additional roles for this structure 
in the process of DNA binding of these transcription 
factors. Structural characteristics of the CArG-box 
not only play an important role in DNA-binding site 
recognition of MADS-domain proteins, but also 
partly explain differences in DNA-binding specificity 
of different members of this transcription factor 
family and their heteromeric complexes. 

INTRODUCTION 

The MADS-domain is a conserved DNA-binding domain 
present in a eukaryote-wide family of transcription factors
(TFs). MADS-domain proteins typically contact their
cognate binding site, the CArG-box (consensus:
CCW6GG), as dimers (1). Structural analysis of animal
and yeast MADS-domain protein dimers revealed that
central parts of their MADS-domains form an antiparallel
coiled coil, made of two amphipathic α-helices, one from
each subunit. This coiled coil lies flat on the DNA minor
groove (2). The N-terminal regions penetrate into the
minor groove and stabilize bending of the DNA. The
C-terminal part of the MADS-domain forms β-sheets
that allow protein dimerization (2-4).

The family of MADS-box genes has dramatically 
expanded during plant evolution, and in particular in 
flowering plants (5). Two major classes of MADS-domain 
proteins can be distinguished: type I proteins, which are a 
heterogeneous group of proteins having only the MADS- 
domain in common, and type II proteins, which have a 
highly conserved modular domain architecture (5). In 
type II proteins, which are also called MIKC-type
proteins, the MADS-domain ('M') is followed by an
intervening ('I') domain, which is predicted to form an
α-helix and contributes to the selection of dimer partners (6).
The I-domain is followed by a keratin-like ('K') domain,
which presumably assembles into coiled-coil structures
enabling dimer and higher-order complex formation. The
K-domain is followed by a highly variable C-terminus that
has roles in transcriptional regulation (7). MIKC-type
genes function as master regulators of developmental 
phase transitions, meristem and floral organ specification. 
Their encoded proteins function together in a combinator- 
ial manner, as they interact with each other forming 
heterodimers and higher-order molecular complexes 
(8-11) [for review, see (12)]. 

The function of each MADS-domain protein complex 
is presumably achieved by regulating partly different sets 
of target genes through specific binding to their DNA
regulatory elements. Although the CArG-box motif is 
the common DNA-binding consensus sequence of the 
MADS-domain TF family, several variants of the 
CArG-box exist that differ in length of the A/T-rich 
region in the central part of the motif and still can be 
considered as MADS-domain TF binding sites (13). 
However, the main in vivo determinants of MADS-domain
TF binding site recognition and their DNA-binding
specificity remain enigmatic. To understand the
various important and specialized roles of MADS- 
domain TFs in plant development, it is essential to 
understand the mechanisms of the DNA-binding site 
recognition by this diverse family of TFs. 

The identification of in vivo DNA-binding events of 
MADS-domain TFs at genome-wide scale provides 
novel opportunities to study parameters and factors 
influencing DNA-binding site recognition. Chromatin 
immunoprecipitation followed by deep sequencing
(ChIP-seq) or hybridization to tiling arrays (ChIP-chip)
has allowed the generation of genome-wide binding
maps of several MADS-domain TFs involved in floral
transition (14,15) and flower development (16-18).
In particular, a study on the floral MADS-domain TF
SEPALLATA3 (SEP3), which acts as a mediator of
higher-order interactions among floral MADS-domain
proteins, has revealed that the CArG-box consensus
sequence (CCW6GG) has only poor predictive power for
DNA binding 'in planta' (17): only 7.7% of all perfect
CArG-boxes are bound by SEP3, and only 17% of the
SEP3 binding events identified contain a perfect CArG-box
consensus. This indicates that the perfect CArG-box
consensus is not an optimal definition for the in vivo DNA
binding of MADS-domain proteins.

In this article, we analyze the structural properties of 
DNA regions bound by specific MADS-domain TFs to 
unravel DNA sequence determinants affecting their 
binding affinity. Our results show that regions bound by 
MADS-domain TFs have a tendency to display particular 
structural properties, and that these structural properties 
may play a role in determining the DNA-binding specifi- 
city of different MADS-domain protein dimers. In par- 
ticular, our results show that certain structural elements 
called A-tracts facilitate MADS-domain TF DNA binding 
when located inside the CArG-box motif and periodically 
distributed around it. 



MATERIALS AND METHODS 

Bioinformatic analysis of ChIP experiments 

ChIP-seq data sets for SEP3 (17), AP1 (16) and FLC (19),
and ChIP-chip data sets for SVP (14) and SOC1 (14) were
re-analyzed in this study. For ChIP-seq experiments,
sequence reads were mapped to the Arabidopsis thaliana
(TAIR9) genome using SOAPv2 (20). Reads mapped to
multiple regions or to the mitochondrial or chloroplast
genome were discarded. We modified the R package
CSAR (21) to generate read-enrichment score values at
each single-nucleotide position, without performing peak
calling. This score represents the ratio between the density of
reads overlapping a given nucleotide in the IP sample
versus the control sample after normalization. For
ChIP-chip experiments, probe sequences were remapped
to the TAIR9 Arabidopsis genome with the Starr package
(22). Only probes that mapped to unique locations were
retained. Subsequently, CisGenome (23) was used to
detect potential binding regions, using the hidden
Markov model to combine intensities of neighboring
probes. In this case, the score value ranges between 0
and 1, where 1 is the most significant.

Subsequently, all CArG-box motifs (CCW6GG) were
located in the TAIR9 genome. We used three definitions
for the CArG-box consensus: (i) perfect CArG-box
(CCW6GG); (ii) long CArG-box (CCW7G); and
(iii) short CArG-box (CCW4S2GG). No mismatches
were allowed. Afterwards, instead of performing a peak
calling step directly on the ChIP-seq and ChIP-chip data,
we defined regions extending 250 bp to each side of each
10 bp motif (510 bp in total) and assigned each region a
ChIP score equal to the maximum ChIP-seq or ChIP-chip
score in that region.
For ChIP-seq analysis, a ChIP-seq threshold was
calculated for a false discovery rate (FDR) < 0.05 using
the function 'permutatedWinScores' from the package
CSAR. This threshold was used to define a set of bound
and unbound regions. We defined regions bound by SEP3
in 'wild-type' but not in the agamous mutant as those regions
with a SEP3 (wt) ChIP-seq score >4.15 (FDR < 0.05)
and a SEP3 (ag mutant) ChIP-seq score <1. Scores of
<1 indicate that the normalized number of mapped 
reads in the control sample is equal to or larger than in 
the IP sample. 
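
The motif search and region scoring described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used in the study (which relied on CSAR, Starr and CisGenome): the function names find_carg_boxes, region_score and classify are hypothetical, the per-nucleotide ChIP scores are assumed to be available as a plain list, and only the perfect CArG-box definition (CCW6GG) is implemented.

```python
import re

# W is the IUPAC code for A or T. The lookahead wrapper also reports
# overlapping matches. As a pattern, CCW6GG equals its own reverse
# complement, so scanning one strand is enough for this definition.
CARG_PERFECT = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(seq, pattern=CARG_PERFECT):
    """Return the 0-based start position of every motif match in seq."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def region_score(scores, motif_start, motif_len=10, flank=250):
    """Maximum per-nucleotide score in the motif plus both flanks.

    With the defaults the window spans 250 + 10 + 250 = 510 positions,
    truncated at the ends of the score vector.
    """
    lo = max(0, motif_start - flank)
    hi = min(len(scores), motif_start + motif_len + flank)
    return max(scores[lo:hi])


def classify(scores, starts, threshold):
    """Map each motif start to True (bound) or False (unbound)."""
    return {s: region_score(scores, s) > threshold for s in starts}
```

For example, find_carg_boxes applied to the 'AG wt' probe sequence locates its single CArG-box, and classify with the threshold 4.15 quoted above separates bound from unbound regions in a hypothetical score vector.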

Analyzing DNA structural properties 

Dinucleotide properties (73 in total) were obtained from 
the DiProDB database (24). They were used to estimate 
several properties of the DNA at each dinucleotide step. 
From these properties, we calculated average differences 
between the set of regions identified as bound by SEP3 in 
our ChIP-seq analysis (FDR < 0.05) and the set of regions
identified as SEP3 unbound. 

A-tract elements were defined with the motif AmTn,
where m + n > 3. The length of a consecutive stretch of
A followed by T was counted in both cases as the
maximum m + n in the consensus AmTn.
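
The AmTn definition translates directly into a regular expression. The following is a minimal sketch; the helper names atract_length and has_atract are hypothetical and are not taken from the original analysis code.

```python
import re

# A run of A's followed by a run of T's (either run may be empty): AmTn.
# A TpA step ends the stretch, because a T can never be followed by an A
# inside a single match of this pattern.
_AT_RUN = re.compile(r"A*T*")


def atract_length(seq):
    """Return the largest m + n over all AmTn stretches in seq."""
    return max(len(m.group()) for m in _AT_RUN.finditer(seq.upper()))


def has_atract(seq, min_len=4):
    """True if seq contains an A-tract, i.e. a stretch with m + n > 3."""
    return atract_length(seq) >= min_len
```

Applied to the CArG-boxes of the probes described in the QuMFRA section, this gives a length of 4 for CCAAATAAGG ('AG wt') and a length of 2 for CCTATATAGG ('AG mut'), so only the former counts as containing an A-tract.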

DNA conservation studies 

The aligned DNA sequences of 81 A. thaliana accessions
were obtained from the 1001 genome project
(http://www.1001genomes.org/; release 5 December 2010). We
associated CArG-box motifs with the SEP3 ChIP-seq
score in the accession Col-0, and we extracted their
corresponding sequences in the other accessions. Only
sequences in which all nucleotides had been identified were
considered; sequences containing Ns were removed from
the analysis. These regions were classified depending on
the presence or absence of an A-tract element in the Col-0
accession. For the conservation analysis, only CArG-box
regions that have at least one SNP in one ecotype
compared with Col-0 within their 10 bp sequences were
considered. The proportion of CArG-box regions with
conserved A-tract element length was calculated as the
number of CArG-box regions at a given ChIP-seq score
threshold that have exactly the same A-tract element
length in all ecotypes considered, divided by the total
number of CArG-box regions
at that ChIP-seq score threshold.
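
The conservation ratio can be illustrated with a short sketch. Several points here are assumptions rather than details given in the text: the input layout (a list of tuples holding the Col-0 ChIP-seq score, the Col-0 sequence and the sequences of the other accessions), the interpretation of "at a given threshold" as "score at or above the threshold", and the function name conserved_proportion.

```python
import re


def atract_length(seq):
    """Largest m + n over all AmTn stretches in seq."""
    return max(len(m.group()) for m in re.finditer(r"A*T*", seq.upper()))


def conserved_proportion(regions, threshold):
    """Fraction of eligible CArG-box regions with conserved A-tract length.

    regions: iterable of (chip_score, col0_seq, other_accession_seqs).
    A region is eligible if its score is >= threshold (assumption) and at
    least one accession, after dropping sequences containing N, differs
    from Col-0. Returns None when no region is eligible.
    """
    eligible = conserved = 0
    for score, col0, others in regions:
        if score < threshold:
            continue
        # Discard accession sequences with unidentified nucleotides.
        others = [s.upper() for s in others if "N" not in s.upper()]
        # Require at least one SNP relative to Col-0.
        if not any(s != col0.upper() for s in others):
            continue
        eligible += 1
        ref = atract_length(col0)
        if all(atract_length(s) == ref for s in others):
            conserved += 1
    return conserved / eligible if eligible else None
```

Evaluating this function over a grid of thresholds yields one curve of the kind described above; the sequences used to exercise it are made up for illustration.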

Quantitative multiple fluorescence relative affinity 
(QuMFRA) 

QuMFRA experiments were performed as described previously
(25). Oligonucleotide sequences used in the 'AG
intron' experiments (Figure 4 and Supplementary Figure
S8) were derived from the first intron of the AG locus and
contained a single CArG-box with an A-tract element
of length four inside. The probe 'AG wt' has the
sequence 5'-TATATATATT(CCAAATAAGG)AAAGTATGGA.
The probe 'AG mut' represents the same sequence,
but the A-tract element inside the CArG-box was eliminated
by the substitution of ApA or ApT steps by TpA;
specifically, it has the sequence
5'-TATATATATT(CCTATATAGG)AAAGTATGGA.
CArG-box sequences are indicated within parentheses.

Oligonucleotide sequences used in the SOC1 promoter
studies (Supplementary Figure S7) were derived from the
SOC1 promoter and contained two CArG-boxes [CArG-box
III: -96 bp and CArG-box IV: -125 bp as described by
Immink et al., 2012 (15)] separated by four A-tract
elements. Probe 'SOC1 wt' has the sequence
5'-TTG(CTATTTTTGG)TCCCTCGGATTACTAAAGAAAACGTAACTTAGAAATCCAATAATAATTCAGCTTATCGAACGTCTTGTCTAGCTAGTGGCACCAAAAAAATAT(CCTTTTTTGG)AGA,
and probe 'SOC1 mut' represents the
same sequence but with the four A-tract elements
eliminated by the substitution of ApA or ApT steps by
TpA. A-tract elements inside the two CArG-boxes
were not modified. Specifically, the sequence is
5'-TTG(CTATTTTTGG)TCCCTCGGATTACTAAAGATATCGTAACTTAGATATCCAATAATATATCAGCTTATCGAACGTCTTGTCTAGCTAGTGGCACCATATATATAT(CCTTTTTTGG)AGA.
CArG-boxes are indicated within parentheses.

Single-stranded DNA oligonucleotides were commercially
synthesized, annealed and inserted into the pGEM-T
vector (Promega). Double-stranded DNA (dsDNA) fragments
were amplified by PCR with infrared 5'-fluorescent-labeled
(Dy682 or Dy782) primers specific for the pGEM-T
vector and gel-purified, and their concentration was measured.
Electrophoretic mobility shift assays (EMSAs) were
performed as described previously (11), with 2 µl of
in vitro synthesized proteins (TNT Coupled Wheat Germ
Extract, Promega) and an equimolar (75 fmol each)
mixture of two different dsDNA sequences, each labeled
with a different IR-fluorophore. Both the protein-DNA
binding reaction and the EMSA were performed in
temperature-controlled environments at 4°C (cold
room), 16°C (water bath) and 25°C (incubator). A low
voltage (75 V/6.8 cm gel) was applied during
electrophoresis to avoid temperature changes within the
gel-running chamber during the run. EMSA gels were
scanned with the Odyssey Infrared Imaging System (Li-Cor),
and the band shift signals were quantified using Odyssey
Software v1.2 (Li-Cor), taking the 'Integrated Intensity' (I.I.)
parameter for further quantification. The relative
binding affinity [Ka(D1)/Ka(D2)] of the protein complex
considered for probe 1 (D1) compared with probe 2 (D2)
was calculated as described previously (25) using the equation
Ka(D1)/Ka(D2) = ([P-D1]*[D2])/([P-D2]*[D1]),
where [P-Di] is estimated as the intensity of the bound
dsDNA probe i (Di), and [Di] is estimated as the intensity
of the free dsDNA probe i (Di) within a single EMSA lane
after background noise subtraction. The relative binding
affinity was measured based on six independent QuMFRA
replicates for the AG intron element measurements and
four replicates for the SOC1 promoter sequence element.
For both experiments, half of the replicates were done 
with probe 1 labeled with Dy682 and probe 2 with 
Dy782, and the other half with probe 1 labeled with 
Dy782 and probe 2 with Dy682. 
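
As a worked illustration of the relative affinity calculation, the ratio of bound and free band intensities and its summary over replicates can be computed as follows. This is a minimal sketch: the function names and any intensity values passed to them are hypothetical, and the intensities are assumed to be background-subtracted already.

```python
from math import sqrt
from statistics import mean, stdev


def relative_affinity(bound1, free1, bound2, free2):
    """Affinity for probe 1 relative to probe 2 within one EMSA lane.

    Implements ([P-D1] * [D2]) / ([P-D2] * [D1]), with each concentration
    estimated by the corresponding band intensity.
    """
    return (bound1 * free2) / (bound2 * free1)


def summarize(values):
    """Return (mean, standard error of the mean) over replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))
```

Because both probes compete for the same protein in the same lane, the protein concentration cancels out of the ratio, which is why only the four band intensities are needed.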

RESULTS 

CArG-boxes bound by SEP3 complexes are defined by 
particular DNA structural properties 

To understand the specificity of SEP3 DNA binding, we 
identified patterns of DNA structural properties common 
to a set of 'functional' CArG-boxes as identified by SEP3 
binding (FDR < 0.05) (17). We focussed on SEP3 because
of its ability to form complexes with several other MADS-domain
TFs and, therefore, to give a broad picture of the
MADS-domain TF binding events. To do so, we
estimated DNA structural properties, as defined in the
dinucleotide property database [DiProDB (24)], for each
dinucleotide step of regions around all (7741) CArG-boxes
(CCW6GG) in the Arabidopsis nuclear genome. Figure 1A
shows a heatmap representing regions with different struc- 
tural properties obtained by comparing CArG-box 
regions bound (FDR < 0.05) versus not bound by SEP3 
at each dinucleotide position using a t-test statistic. The
three most central dinucleotides and the flanking regions
of the CArG-box sequence showed the highest differences
when comparing CArG-boxes that are bound versus not
bound by SEP3, indicating that the structural properties
of these locations are important for binding (Figure 1A).
To identify which properties show the best correlation
with the SEP3 ChIP-seq score, we related the average
property value over the 10-bp CArG-box sequence to
its associated SEP3 ChIP-seq score threshold value. We
observed the strongest correlation with the 'mobility to
bend toward the minor groove' (μ) property (r = 0.69;
pv< 10~^'; Figure 1B), which measures the ability of the
DNA to be bent toward the minor groove by the
Escherichia coli catabolite activator protein, measured as
the relative complex gel mobility (μ) (26). In addition,
among the 10 structural properties with the strongest
correlations to the SEP3 ChIP-seq score threshold, we
found 'minor groove width' (Å) (r = -0.56; pv< 10~^';
Figure 1C) and DNA 'minor groove depth' (Å)
(r = 0.55; pv < 10~^'). In summary, properties of the
DNA groove and degree of bending seem to correlate with
the SEP3 binding event.

Figure 1. DNA structure properties of CArG-box regions bound by SEP3. (A) Heatmap showing which DNA properties at which location have
significantly higher (-log10 P-value; blue color) or lower values (log10 P-value; red color) in the CArG-box regions bound by SEP3 (FDR < 0.05)
compared with CArG-box regions unbound by SEP3 using a t-test statistic. The three most central dinucleotides of the CArG-box and its flanking
regions show the highest differences. The properties 'mobility to bend toward the minor groove' (B) and 'minor groove width' (C) are among the
properties with the highest correlations with SEP3 ChIP-seq score. For panels (B) and (C), values are plotted only up to the ChIP-seq score threshold
at which the average property is still calculated from at least 50 CArG-box regions.

A-tract elements are overrepresented in SEP3-bound 
CArG-box sequences 

The structural properties of functional CArG-boxes 
(CCW6GG) that were detected in our analysis show
striking similarities with the properties of DNA elements
known as A-tracts. A-tracts have been defined as 4-8 consecutive
A·T base pairs without a TpA step (27). The
consensus of one A-tract element can be described with
the motif NiAmTnNj, where m + n > 3 and the total length
of the motif is 10 bp. DNA regions containing
in-phase A-tract repeats show a narrower minor groove
width and higher bendability toward the minor groove
than other AT-rich regions (27).

Because of their structural and sequence similarities, we
studied how the presence of A-tracts in the CArG-box
region relates to the binding of SEP3. Figure 2 shows
that the normalized proportion of DNA regions containing
an A-tract (m + n > 3) inside the 10 bp CArG-box
sequence increases with the ChIP-seq score threshold
used, supporting a positive dependency of binding on
the presence of an A-tract.
In contrast, the proportion of regions without an A-tract
inside the CArG-box (m + n < 4) tends to decrease with
the threshold used. In particular, for SEP3 wt ChIP-seq
(Figure 2A), the Pearson correlation (r) was -0.96
(pv<2x 10""^), -0.94 (pv<2x 10"'*), 0.50 (pv<2x 
10""^), 0.02 (pv<0.81), 0.66 (pv<2x 10""^) for A-tract 
length of 2-6, respectively. When we eliminate from this 
data set the binding events that are also present in the
SEP3 ag mutant ChIP-seq experiment, we expect to have
an enrichment of binding events of complexes formed
mainly by AG and SEP3. This allows us to investigate whether
there is a different pattern of A-tract length enrichment
depending on the type of SEP3 MADS-domain complex
(Figure 1). Because of the large overlap of these two data
sets, the subtraction of common binding sites decreases
the range of score values of the binding sites considered,
which is also the reason why several enrichment curves
in Figure 2B do not reach the FDR < 0.05 threshold. The
A-tracts of length 4 (pv < 0.06; hypergeometric test) and 6
(pv< 1.3 X 10~^; hypergeometric test) for SEP3 'wt' and
of length 4 (pv < 0.006; hypergeometric test) for SEP3
binding events not present in the ag mutant showed the
highest enrichment at the threshold level of FDR < 0.05
in the SEP3 ChIP-seq experiments studied. A similar
pattern of enrichment for A-tracts was also observed for
other MADS-domain TF ChIP-seq and ChIP-chip experiments
and alternative definitions of the CArG-box consensus
(Supplementary Figures S1-3). In particular, for FLC,
AP1, SOC1 and SVP, the A-tract element of length six
was the one most strongly enriched.

Figure 2. Enrichment of A-tract elements in SEP3-bound CArG-box sequences. The proportion of CArG-box motifs with a particular A-tract
element inside, normalized by the genome-wide proportion of CArG-boxes with that A-tract element, is plotted against the ChIP-seq
score threshold used, for (A) SEP3 ChIP-seq; and (B) SEP3 ChIP-seq regions that lose the binding event in the ag mutant. The figure shows an
increase in the normalized proportion of CArG-box sequences containing an A-tract (m + n > 3) with the ChIP-seq score used. In contrast, the
proportion of CArG-box motifs without an A-tract (m + n < 4) decreases with the ChIP-seq score. Values are only plotted up to the ChIP-seq score
at which there are still at least 15 CArG-boxes to calculate the ratio. The dashed line indicates the SEP3 ChIP-seq threshold value for FDR < 0.05.

The flanking regions of CArG-boxes (CCW6GG),
defined with an arbitrary length of 250 bp at each side (for
a total of 510 bp), bound by SEP3 were also characterized by
a higher presence of A-tract elements than the flanking
regions of non-bound CArG-box regions (Figure 3A).
This overrepresentation is not due to a different AT
content: when we eliminated the A-tract sequences
from the studied regions, the AT content was almost identical
(Supplementary Figure S4). Furthermore, we used
Fisher's exact g-test (28) to test periodicity of the location
of the A-tract elements for each single 510 bp CArG-box-containing
region. We found that 88% of the SEP3-bound
(FDR < 0.05) CArG-box surrounding regions showed a significant
periodicity in the location of A-tract elements
(pv < 0.05), whereas the percentage for unbound regions
containing CArG-boxes was only 67%. The distribution
of the P-values for the g-test of bound and unbound
regions was markedly different (t-test; pv<10~'^)
(Supplementary Figure S5). Moreover, we studied the
distribution of A-tract element locations relative to
the middle position of the CArG-box sequence (Figure 3B
and Supplementary Figure S6), and the estimated dominant
A-tract location periodicity for the 510 bp regions bound by
SEP3 was found to be 22.1 bp. This distance was estimated
as the average distance (1/dominant frequency) for each
SEP3-bound (FDR < 0.05) CArG-box region that shows
a significant (pv < 0.05) periodicity.
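
The periodicity test and the estimate of the dominant period can be sketched as below. This is an illustration under assumptions, not the code used in the study: each region is assumed to be encoded as a numeric series with one value per base pair (for example 1 inside an A-tract and 0 elsewhere), the function names are hypothetical, and the p-value uses the standard closed-form expression for Fisher's g statistic.

```python
import cmath
from math import comb


def periodogram(x):
    """Periodogram at the Fourier frequencies k/n, k = 1..(n-1)//2."""
    n = len(x)
    mu = sum(x) / n
    centered = [v - mu for v in x]
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        s = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(centered))
        ordinates.append(abs(s) ** 2 / n)
    return ordinates


def fisher_g_test(x):
    """Return (g, p_value, dominant_period) for the series x.

    g is the largest periodogram ordinate divided by the sum of all
    ordinates; the dominant period is 1 / dominant frequency.
    """
    n = len(x)
    pgram = periodogram(x)
    q = len(pgram)
    total = sum(pgram)
    if total == 0:
        raise ValueError("series is constant or too short")
    k_max = max(range(q), key=pgram.__getitem__)
    g = pgram[k_max] / total
    # Exact null distribution of g: alternating sum over j = 1..floor(1/g).
    p = sum((-1) ** (j - 1) * comb(q, j) * (1 - j * g) ** (q - 1)
            for j in range(1, int(1 / g) + 1))
    p = min(max(p, 0.0), 1.0)  # guard against rounding error
    return g, p, n / (k_max + 1)
```

Averaging the dominant period over all regions with a significant test result gives a single summary value analogous to the one reported above.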

The bioinformatics analyses suggest a role for periodic- 
ally distributed A-tract elements in DNA binding of 
MADS-domain protein complexes. The
flanking regions of CArG-box sequences may facilitate the
looping of the DNA by higher-order complexes of
MADS-domain proteins. Therefore, we next experimentally
studied the importance of A-tract elements for
MADS TF/DNA complex formation. The SOC1
promoter contains two CArG-box sequences where
SEP3 is able to bind (15). Each CArG-box sequence 
contains one A-tract element and they are separated by 
four A-tract elements. We studied the affinity of SEP3 to a 
probe representing this region compared with a probe rep- 
resenting the same region but with the A-tract elements 
between the two CArG-boxes mutated by substitution of 
the ApA or ApT steps by TpA. The relative affinity of the
SEP3 homodimer seemed to be only slightly affected by the
elimination of the A-tract elements (1.4-fold change for
the unmutated probe compared with the mutated one;
standard error 0.07 over 4 replicates; Supplementary
Figure S7). However, the relative affinity of a SEP3
higher-order complex for the unmutated probe was
3.6-fold higher (standard error 0.26 over 4 replicates;
Supplementary Figure S7) than for the
mutated probe, indicating that the location of A-tract
elements between the two CArG-boxes in the SOC1
promoter facilitates the formation of SEP3 tetramer
(or higher-order protein)-DNA complexes.
Figure 3. Multiple A-tracts in SEP3-bound CArG-box regions. (A) Distribution of multiple A-tract (m + n > 3) elements within the 250-bp region at
either side of the CArG-box motif (510-bp region in total) bound by SEP3 or not bound by SEP3. (B) Proportion of CArG-box regions with an A-tract
element in a particular position. A moving average of length 5 bp was applied to obtain a smoother representation of the data. Regions
with a SEP3 ChIP-seq binding event (FDR < 0.05) are indicated in green, and regions without a binding event are indicated in red. Dashed lines are
placed every 11 bp from the middle of the CArG-box motif, representing a helical turn. For illustrative reasons, only the region -60 to 60 bp is
shown; for the 510 bp region, see Supplementary Figure S6.



A-tract DNA curvature plays a role in the DNA-binding 
specificity of MADS-domain proteins 

Our analysis of ChIP-seq data presented above
suggests the importance of A-tracts for DNA binding by
MADS-domain proteins. Because A-tract length is related
to the degree of curvature of the DNA region where it is
located and because several MADS-domain protein
homo- and heterodimers bend the DNA in vitro to different
degrees (29,30), we analyzed the in vivo preference
of MADS-domain protein complexes for CArG-box
sequences with different A-tract lengths. ChIP-seq experiments
identify the binding regions of the set of protein
complexes targeted by the antibody used. To narrow the
specificity to particular protein complexes, one can
compare ChIP-seq experiments in mutants lacking some
of the potential protein binding partners. In this way,
DNA regions detected by the SEP3 ChIP-seq experiment
in wild-type (wt) but not in the 'agamous' (ag) mutant (17)
are expected to be mainly bound by protein complexes
containing SEP3 and AG. These DNA regions are
enriched in CArG-boxes with an A-tract of length 4
(Figure 2B), in contrast to the preferences for lengths 4
and 6 in the wt ChIP-seq experiment (Figure 2A). These
results indicate that some MADS-domain protein
complexes, e.g. the SEP3-AG heterodimer, have a preference
for CArG-boxes with particular A-tract properties.

DNA curvature of regions containing A-tract elements
strongly depends on temperature. Koo et al. (31)
found a decrease in bending magnitude when passing
from 4°C to room temperature, and Diekmann et al.
(32) revealed that the decrease with temperature is
monotonic. This property enables us to modulate the
DNA curvature of the same DNA sequence fragments
and, therefore, to experimentally study the importance
of DNA curvature for the DNA-binding affinity and specificity
of MADS-domain proteins. We used QuMFRA
experiments at different temperatures to estimate the
relative affinities of three MADS-domain protein combinations
(only SEP3, SEP3 and AG, and only AG) for a
probe representing the AG intron compared with a
probe representing the AG intron in which the A-tract
element inside the CArG-box region was mutated by the
introduction of TpA steps. We chose these three combinations
because our analysis (Figure 2A and B) showed that
the SEP3-AG heterodimer has a preference for A-tract
elements of a different length than other SEP3 complexes,
and we also added the AG-AG homodimer to ensure that
the temperature-dependent changes in affinities were not
only due to a temperature-dependent change in the proportion
of homodimer/heterodimer formed in the mix.
The results of the QuMFRA experiments (Figure 4,
Supplementary Figure S8) show that the elimination of
the A-tract element decreases the affinity of all three
MADS-domain protein/DNA complexes compared with
the 'wt' sequence by 2- to 16-fold, depending on the dimer
and condition considered. This supports our hypothesis
that A-tract elements inside the CArG-box
sequence facilitate DNA binding. Strikingly, the
relative affinity changed with temperature (Figure 4).
Although the DNA binding of the AG homodimer is relatively
independent of temperature, the relative affinity
of the SEP3 homodimer and SEP3-AG heterodimer
depends more strongly on temperature.
Figure 4. Temperature-dependent DNA affinity of MADS-domain
complexes. Relative binding affinity of three MADS-domain complexes
for a probe representing the AGAMOUS intron relative to the affinity for
a probe representing the same region but with the A-tract element
inside the CArG-box mutated (see 'Materials and Methods'
section). The relative affinity was studied at different temperatures by
QuMFRA experiments. Error bars indicate the standard error calculated
from six replicates. Supplementary Figure S8 shows the gel images
of two replicates from which these affinities were calculated.



A-tract length in SEP3 binding sites is conserved among 
Arabidopsis ecotypes

To further assess the functional importance of A-tract 
length within the central CArG-box core sequence, we 
analyzed DNA sequence conservation. The proportion of 
10 bp Col-0 CArG-box sequences with conserved length of
their A-tract among the 81 sequenced Arabidopsis ecotypes
(1001 genome project) is higher in regions bound by SEP3
TF complexes than in CArG-box sequences without
SEP3 binding (Figure 5; Pearson correlation r = 0.97;
pv < 2 X 10~' ). In contrast, the proportion of CArG-box
sequences with conserved length of consecutive A and T
base pairs for non-A-tracts (m + n < 4) decreases with the
SEP3 ChIP-seq score (Pearson correlation r = -0.52;
pv < 0.012). This supports not only the functionality of
the A-tract inside the CArG-box sequence but also the
importance of its length.

DISCUSSION 

The 10 bp DNA sequence motif known as CArG-box 
represents the DNA-binding consensus of MADS- 
domain TFs. Previous studies focused on the characteriza- 
tion of the primary DNA sequence of this binding site, 
largely omitting the importance of structural properties of 
the DNA. Since the first structural characterization of the 
DNA-binding domain of an animal MADS-domain TF in 
1995 (2), it has been suggested that this family of TFs 
binds DNA by the interaction of their amino acids 
mainly with the minor groove side of the DNA. This 
type of recognition usually relies on structural properties 
of the DNA more than a specific sequence of DNA 
bases (33). Here, we studied the importance of the DNA 
structure as a determinant in the DNA recognition and 
specificity of MADS-domain TFs. 

Figure 5. Conservation of the A-tract length in functional CArG-box
regions. The average proportion of CArG-box motifs with conserved
length of the motif AmTn among the 81 A. thaliana ecotypes (see
'Materials and Methods' section) is shown as a function of the SEP3
ChIP-seq score threshold. Green, A-tract elements with length 4-6;
red, AT-regions with length 2-3 (non-A-tract elements). Only CArG-box
motifs with at least one SNP compared with Col-0 inside the 10 bp
CArG-box region in at least one ecotype are considered. Proportions
are only plotted up to the threshold level at which at least 15 CArG-box
sequences are considered. The vertical dashed line indicates the threshold
score value corresponding to FDR < 0.05.

We studied a set of 73 DNA properties as potential
factors that can influence the binding of plant SEP3 TF
complexes. Among the most significant properties
associated with functional CArG-boxes were those 
related to the minor DNA groove and bendability of 
DNA. Genomic regions bound by SEP3 complexes were 
also found to be associated with the presence of periodic- 
ally distributed A-tracts elements. These elements are 
known to confer a particularly high level of curvature 
and narrow minor groove width to the DNA regions 
where they are periodically located. Interestingly, 
previous in vitro studies have shown that some MADS- 
domain TFs are able to bend the DNA at different degrees 
[e.g. 53° by API, 70° by AG; (29)]. We hypothesize that 
the affinity of MADS-domain TFs could be related 
with the energy needed to modify the DNA conformation 
to the one observed on binding, and therefore, DNA- 
binding affinity wiU depend on a priori structural 
properties of the DNA. This mechanism of DNA- 
binding recognition has been already proposed for the 
human protein NF-kB (34), where DNA bending in the 
binding site of this factor in the bound state is similar to 
the bending already present in its free state. This particular 
bent conformation seems to be faciUtated by the presence 
of A-tract elements. Suggesting a similar mechanism for 
the MADS-domain TF binding event, we found a positive 
association of A-tracts inside CArG-box sequences and 
in vivo MADS-domain TF binding. Our in vitro experi- 
ments on a sequence representing the AG intron also 
support the importance of A-tract elements inside of the 
CArG-box motif, as its presence increases the relative 
affinity of SEP3 complexes by 2- to 16-fold, depending 
on the dimer and temperature considered. Additionally, 
MADS-domain TFs can form quaternary protein 
complexes that loop the DNA around two CArG-box 
elements (8,9,11,35-37). We hypothesize that our 
observed periodicity of A-tracts in CArG-box flanking 
regions could be associated with the need of looping the 
DNA by higher order complexes in vivo. In fact, our 
in vitro experiment on the SOC1 promoter shows that 
the elimination of A-tract elements between two CArG- 
boxes with the simple substitution of ApA or ApT steps 
by TpA decreases the in vitro binding of SEP3 higher 
order complexes to this sequence. This result supports 
the hypothesis that A-tract elements in flanking regions 
may facilitate the looping of the DNA on binding of 
some MADS-domain TFs (e.g. SEP3), and it is tempting 
to speculate that they may contribute to the DNA-binding 
specificity of higher order complexes. 
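
The idea of periodically spaced A-tracts, and of removing them by introducing TpA steps, can be made concrete with a minimal Python sketch. It assumes that an A-tract is an A_nT_m run of at least 4 bp without a TpA step. The function names and both example sequences are hypothetical and constructed for illustration only; they are not the SOC1 promoter fragments used in the experiment.

```python
import re


def a_tract_starts(seq: str, min_len: int = 4) -> list:
    """Start positions of A_nT_m runs (no TpA step) that are at least min_len bp long."""
    return [m.start()
            for m in re.finditer(r"A*T*", seq.upper())
            if len(m.group()) >= min_len]


def spacings(starts: list) -> list:
    """Distances in bp between the starts of consecutive A-tracts."""
    return [b - a for a, b in zip(starts, starts[1:])]


# Hypothetical sequence with three A-tracts placed 11 bp apart.
wild_type = "GC" + "AAAATT" + "GCGCG" + "AAAAA" + "GCGCCG" + "TTTTT" + "GC"
# Same length and base composition class, but every tract is broken by TpA steps.
mutant = "GC" + "TATATA" + "GCGCG" + "TATAT" + "GCGCCG" + "TATAT" + "GC"

print(a_tract_starts(wild_type), spacings(a_tract_starts(wild_type)))
print(a_tract_starts(mutant))
```

In this toy example the TpA substitutions leave the AT content of each segment high but remove every run that qualifies as an A-tract, which is the kind of change the promoter experiment relies on.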

Because several MADS-domain protein dimers are able 
to bend the DNA to different degrees, this structural 
property can play a role in the specificity of different 
dimers. For example, the mammalian MADS-domain 
factor Myocyte enhancer factor 2A (MEF2A), which 
hardly induces DNA bending, has the consensus binding 
motif CTAW4TAG, whereas the serum response factor, 
with the standard CCW6GG consensus binding motif 
(13,38), induces dramatic DNA bending on binding. In 
fact, West and Sharrocks (39) already speculated about a 
possible link between DNA-bending and DNA-binding 
specificity of MADS-domain TFs. By exploiting the 
temperature-dependent curvature of A-tract elements, we 
obtained confirmation for this hypothesis. Changing the 
temperature will not modify the primary DNA sequence, 
but it will affect the curvature of the DNA containing an 
A-tract (31,40). We observed that the relative in vitro 
affinity of the SEP3 and SEP3-AG dimers changes with 
the temperature, supporting the influence of the DNA 
curvature in the in vitro DNA-binding specificity of 
these two dimers, whereas the affinity of AG 
homodimers shows only minimal changes. Additionally, 
we found that DNA regions bound by different SEP3 
dimers in vivo show an overrepresentation of A-tracts of 
different length (Figure 2). The curvature induced by short 
A-tract elements in vitro is lower than for long A-tracts 
(27), which supports the hypothesis that the DNA curva- 
ture-dependent specificity of MADS-domain TFs may 
also be important in vivo. The fact that the length of A-tract 
elements is conserved among the Arabidopsis ecotypes for 
regions bound by MADS-domain TFs also indicates the 
evolutionary importance of this structural property. 

The fact that temperature may differentially affect 
the DNA-binding affinity of particular MADS-domain 
dimers opens the door to new possibilities of how tem- 
perature can affect transcriptional regulation by MADS- 
domain TFs. Several MADS-domain TFs act in processes 
that are temperature-dependent, such as floral transition, 
flower maturation and fruit ripening (5). There is a large 
overlap among the target genes of several MADS-domain 
dimers (16). Therefore, it is tempting to speculate that 
temperature can be partially sensed by the plant via modi- 
fication of the DNA-binding affinity of various dimers 
competing for binding to common regulatory regions. 
This would provide a way to activate or repress the 
downstream pathways of target genes affected by these 
regulatory regions depending on the activity of the 
dimers. A similar mechanism of temperature sensing has 
been observed in bacteria, where temperature-dependent 
changes in DNA curvature in promoter regions containing 
A-tract elements play an important role in temperature- 
controlled gene expression (40,41). In eukaryotes, the 
TATA binding protein also shows a temperature-depend- 
ent binding affinity; Kuddus et al. (42) propose that this 
could be related to the fact that TATA binding protein 
affinity is dictated by the conformational flexibility of 
its DNA target (43). Recently, Lee et al. (44) and Pose 
et al. (45) unravelled various aspects of temperature- 
dependent activity of MADS-domain SVP-FLM 
complexes, including temperature-dependent degradation 
of SVP (44) and temperature-dependent alternative 
splicing of FLM (45). Future studies need to reveal the 
biological importance of the various mechanisms of tem- 
perature-dependent binding and regulation 'in planta'. 